
    New suite of Concept Profile Analysis Web Services

    No full text
    <p>This zip file contains the source code for the Concept Profile Mining Web services.<br> The directory 'erasmusmc_maven_dependencies' contains copies of the Erasmus MC JAR files that are hosted in the DTL Nexus repository.</p>

    The Implicitome: A Resource for Rationalizing Gene-Disease Associations

    No full text
    <div><p>High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or even unintended by, the original authors, but they vastly extend the reach of existing biomedical knowledge for the identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate the usefulness of the implicitome by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [<a href="https://www.force11.org/group/fairgroup" target="_blank">https://www.force11.org/group/fairgroup</a>] using nanopublications. An online tool (<a href="http://knowledge.bio" target="_blank">http://knowledge.bio</a>) is available to explore established and potential gene-disease associations in the context of other biomedical relations.</p></div>

    Correction of literature bias in the match score.

    No full text
    <p><b>a,b)</b> Distribution of genes and diseases recognized by LWAS, sorted by publication abundance (log number of MEDLINE abstracts). Red lines indicate the 5-abstract cut-off, below which concept profiles are not constructed. <b>c,d)</b> Distribution of gene and disease rank orders, binned in 10-percentile intervals (x-axis). Higher numbers indicate stronger associations (y-axis).</p>
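A minimal sketch of the two preprocessing steps named in this caption: dropping concepts with fewer than 5 MEDLINE abstracts, and grouping rank orders into 10-percentile bins. It assumes that rank 1 is the strongest association and that a rank's percentile is 100 × rank / number of ranked candidates. The helper names `eligible_concepts` and `percentile_bin` and the toy counts are hypothetical, not taken from the LWAS code.

```python
from typing import Dict, Set

MIN_ABSTRACTS = 5  # cut-off from the caption: no concept profile below 5 abstracts


def eligible_concepts(abstract_counts: Dict[str, int]) -> Set[str]:
    """Return the concepts with enough abstracts to build a concept profile."""
    return {c for c, n in abstract_counts.items() if n >= MIN_ABSTRACTS}


def percentile_bin(rank: int, total: int, width: int = 10) -> int:
    """Map a 1-based rank among `total` candidates to a percentile bin.

    Assumption (hypothetical): percentile = 100 * rank / total, so with the
    default width bin 1 holds the top 10% of ranks and bin 10 the bottom 10%.
    Integer ceiling division avoids floating-point edge cases at bin borders.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return -(-100 * rank // (width * total))


# Usage with made-up concept names and counts.
counts = {"gene_a": 12, "gene_b": 3, "disease_c": 5}
print(sorted(eligible_concepts(counts)))  # ['disease_c', 'gene_a']
print(percentile_bin(1, 100), percentile_bin(11, 100), percentile_bin(100, 100))  # 1 2 10
```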

    Gene-Disease LWAS using concept profiles and networks of implicit information.

    No full text
    <p><b>a)</b> Concepts X and Z share an association in a hypothetical concept network via an explicit link (co-occurrence) and multiple implicit links (indirect connections via the intermediate concepts Y1, Y2, and Y3). The concept profile for concept X is depicted, with the weights (w) between concepts reflecting the co-occurrence frequencies of each concept in the data source. <b>b)</b> Concept profiles for concepts X and Z have explicit links to concepts Y1, Y2, and Y3 but no explicit link between themselves, as reflected in their corresponding concept profiles. <b>c)</b> The intermediate concepts shared by concept profiles X and Z constitute implicit information, indirectly linking X and Z (red dotted line). The strength of the implicit link (match score) is computed as the inner product of the weights of matching concepts in the two concept profiles. <b>d & e)</b> The distribution of concept profile size for gene (median 1,142, maximum 56,028) and disease (median 995, maximum 81,562) concepts. <b>f)</b> The distribution of the number of overlapping concepts between gene and disease concept profiles (median 180, maximum 40,725). Only 23 concept pairs had no overlapping concepts. <b>g)</b> Concept profiles for the human gene <i>CWH43</i> (left) and the disease “Hyperphosphatasia with Mental Retardation” (right), which share no explicit co-occurrence. The 37 overlapping concepts are shown clustered in between. Both the number and the weights of these overlapping links contribute to the strength of the implicit association. <b>h)</b> The distribution of match scores (higher numbers indicating stronger associations) for the 204 million LWAS-derived gene-disease pairs, for both explicit (black) and implicit (red) associations.</p>
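The match score described in panel c can be sketched with concept profiles stored as dictionaries that map concept identifiers to weights. This is an illustration under assumptions, not the authors' implementation: the weights are toy values rather than real co-occurrence-derived weights, and classifying a pair as implicit when neither concept appears in the other's profile is an assumption based on panels b and c. The names `match_score`, `overlap`, and `is_implicit` are hypothetical.

```python
from typing import Dict, Set

ConceptProfile = Dict[str, float]  # concept id -> weight w (toy values below)


def overlap(profile_x: ConceptProfile, profile_z: ConceptProfile) -> Set[str]:
    """Intermediate concepts present in both profiles (the Y concepts)."""
    return set(profile_x.keys() & profile_z.keys())


def match_score(profile_x: ConceptProfile, profile_z: ConceptProfile) -> float:
    """Inner product of the weights of concepts shared by both profiles."""
    return sum((profile_x[c] * profile_z[c] for c in overlap(profile_x, profile_z)), 0.0)


def is_implicit(x: str, z: str, profile_x: ConceptProfile, profile_z: ConceptProfile) -> bool:
    """Assumption: a pair is explicit if either concept occurs in the other's profile."""
    return z not in profile_x and x not in profile_z


# Toy network from panels a-c: X and Z both link to Y1, Y2, Y3 but not to each other.
profile_x = {"Y1": 0.5, "Y2": 0.25, "Y3": 0.5, "Y4": 1.0}
profile_z = {"Y1": 0.5, "Y2": 0.5, "Y3": 0.25}

print(sorted(overlap(profile_x, profile_z)))  # ['Y1', 'Y2', 'Y3']
print(match_score(profile_x, profile_z))  # 0.5
print(is_implicit("X", "Z", profile_x, profile_z))  # True
```

Only the shared concepts Y1, Y2, and Y3 contribute (0.25 + 0.125 + 0.125). Y4 appears only in X's profile and adds nothing, which is why both the size of the overlap and the weights on it drive the score.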

    The relative distribution of LWAS association types.

    No full text
    <p>Distribution of the top 105 highest-ranking implicit gene-disease pairs, determined by manual inspection: <i>Type I Gene family member</i> (n = 71) represents gene-disease associations in which another member of the gene's family causes the disease, or a disease with very large phenotypic overlap; <i>Type II Negation</i> (n = 4) and <i>Type III Homonym</i> (n = 11) represent two classes of LWAS false positives, together accounting for 14% of the cases. <i>Type IV Novel association</i> (n = 19) indicates promising gene-disease associations for follow-up investigation.</p>
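The 14% false-positive figure follows directly from the category counts. A quick check using only the numbers in this caption:

```python
# Category counts for the 105 manually inspected implicit pairs (from the caption).
counts = {
    "Type I Gene family member": 71,
    "Type II Negation": 4,
    "Type III Homonym": 11,
    "Type IV Novel association": 19,
}

total = sum(counts.values())
# Types II and III are the two false-positive classes.
false_positives = counts["Type II Negation"] + counts["Type III Homonym"]

# Percentage of the inspected pairs in each category, to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)  # 105
print(false_positives, round(100 * false_positives / total, 1))  # 15 14.3
```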